09.01.2024
Instructor: dr hab. inż. Artur Gramacki, prof. UZ
Group: 31E-B-SP/B
Miłosz Szymczak 106834@stud.uz.zgora.pl Tomasz Szytuła 106835@stud.uz.zgora.pl Kuba Siupa 106832@stud.uz.zgora.pl
library(tm)
## Warning: package 'tm' was built under R version 4.3.2
## Warning: package 'NLP' was built under R version 4.3.1
library(plotly)
## Warning: package 'plotly' was built under R version 4.3.2
file_path <- "book_ang.txt"
text <- readLines(file_path, warn = FALSE, encoding = "UTF-8")
text <- text[text != ""] # remove empty lines
corpus <- Corpus(VectorSource(text))
head(corpus$content[1])
## [1] "In September 2012, the United Kingdom's competition authority, the Office of Fair Trading (OFT), issued a statement of objections against Booking.com, Expedia, and IHG Army Hotels alleging that Booking.com and Expedia had entered into separate arrangements with IHG which restricted the online travel agent's ability to discount the price of room only hotel accommodation. Booking.com, Expedia and IHG proposed the OFT to change their restrictions. The OFT accepted the proposal, but it was later rejected by higher authority at a tribunal."
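The filter `text != ""` keeps lines that contain only spaces or tabs, and each of those would become an (almost) empty document. A minimal base-R sketch of a stricter filter follows; the vector `raw_lines` is hypothetical and stands in for the result of `readLines()`.

```r
# Hypothetical input standing in for readLines("book_ang.txt")
raw_lines <- c("First paragraph.", "", "   ", "Second paragraph.")

# trimws() turns whitespace-only lines into "", and nzchar() drops them
kept <- raw_lines[nzchar(trimws(raw_lines))]
print(kept)
```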
corpus <- tm_map(corpus, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(corpus, content_transformer(tolower)):
## transformation drops documents
head(corpus$content[1])
## [1] "in september 2012, the united kingdom's competition authority, the office of fair trading (oft), issued a statement of objections against booking.com, expedia, and ihg army hotels alleging that booking.com and expedia had entered into separate arrangements with ihg which restricted the online travel agent's ability to discount the price of room only hotel accommodation. booking.com, expedia and ihg proposed the oft to change their restrictions. the oft accepted the proposal, but it was later rejected by higher authority at a tribunal."
corpus <- tm_map(corpus, removePunctuation)
## Warning in tm_map.SimpleCorpus(corpus, removePunctuation): transformation drops
## documents
head(corpus$content[1])
## [1] "in september 2012 the united kingdoms competition authority the office of fair trading oft issued a statement of objections against bookingcom expedia and ihg army hotels alleging that bookingcom and expedia had entered into separate arrangements with ihg which restricted the online travel agents ability to discount the price of room only hotel accommodation bookingcom expedia and ihg proposed the oft to change their restrictions the oft accepted the proposal but it was later rejected by higher authority at a tribunal"
corpus <- tm_map(corpus, removeNumbers)
## Warning in tm_map.SimpleCorpus(corpus, removeNumbers): transformation drops
## documents
head(corpus$content[1])
## [1] "in september the united kingdoms competition authority the office of fair trading oft issued a statement of objections against bookingcom expedia and ihg army hotels alleging that bookingcom and expedia had entered into separate arrangements with ihg which restricted the online travel agents ability to discount the price of room only hotel accommodation bookingcom expedia and ihg proposed the oft to change their restrictions the oft accepted the proposal but it was later rejected by higher authority at a tribunal"
corpus <- tm_map(corpus, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(corpus, removeWords, stopwords("english")):
## transformation drops documents
head(corpus$content[1])
## [1] " september united kingdoms competition authority office fair trading oft issued statement objections bookingcom expedia ihg army hotels alleging bookingcom expedia entered separate arrangements ihg restricted online travel agents ability discount price room hotel accommodation bookingcom expedia ihg proposed oft change restrictions oft accepted proposal later rejected higher authority tribunal"
corpus <- tm_map(corpus, stripWhitespace) # collapses runs of whitespace into one space; a leading space can remain
## Warning in tm_map.SimpleCorpus(corpus, stripWhitespace): transformation drops
## documents
head(corpus$content[1])
## [1] " september united kingdoms competition authority office fair trading oft issued statement objections bookingcom expedia ihg army hotels alleging bookingcom expedia entered separate arrangements ihg restricted online travel agents ability discount price room hotel accommodation bookingcom expedia ihg proposed oft change restrictions oft accepted proposal later rejected higher authority tribunal"
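The `tm` steps above can be approximated with base-R regular expressions, which makes each transformation explicit. This is a sketch under assumptions: the short `stop_words` vector is hypothetical (the report uses `stopwords("english")` from `tm`), the patterns only approximate `removePunctuation`, `removeNumbers`, `removeWords` and `stripWhitespace`, and the final `trimws()` is an extra step that removes the leading space still visible in the output above.

```r
# Hypothetical stopword list; the report itself uses tm's stopwords("english")
stop_words <- c("in", "the", "a", "of", "and", "to")

preprocess <- function(x, stop_words) {
  x <- tolower(x)                        # like content_transformer(tolower)
  x <- gsub("[[:punct:]]+", "", x)       # like removePunctuation
  x <- gsub("[[:digit:]]+", "", x)       # like removeNumbers
  # like removeWords: delete whole-word matches of any stopword
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)
  x <- gsub("[[:space:]]+", " ", x)      # like stripWhitespace
  trimws(x)                              # extra step: drop leading/trailing space
}

out <- preprocess("In September 2012, the Office of Fair Trading (OFT) issued", stop_words)
print(out)
```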
dtm <- DocumentTermMatrix(corpus)
matrix_dtm <- as.matrix(dtm)
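A document-term matrix stores one row per document (here, one non-empty line of the file) and one column per term, with raw term counts in the cells. The base-R sketch below builds such a matrix for three hypothetical documents. It does not reproduce every `tm` default; for example, `DocumentTermMatrix` by default discards terms shorter than three characters (`wordLengths = c(3, Inf)`).

```r
# Hypothetical, already preprocessed documents
docs <- c("booking expedia hotel", "hotel hotel price", "expedia price")
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary term per document; vapply gives terms x documents,
# so transpose to get documents x terms
dtm <- t(vapply(tokens,
                function(tk) as.numeric(table(factor(tk, levels = vocab))),
                numeric(length(vocab))))
dimnames(dtm) <- list(paste0("doc", seq_along(docs)), vocab)
print(dtm)
```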
svd_result <- svd(matrix_dtm)
u <- svd_result$u
d <- svd_result$d
u1 <- u[, 1]
d1 <- d[1]
sentence_scores <- u1 * d1
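# The sign of a singular vector is arbitrary, so rank by magnitude instead of assuming it is negative
top_sentences_indices <- order(abs(sentence_scores), decreasing = TRUE)[1:10]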
top_sentences <- text[top_sentences_indices]
list(top_sentences)
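Only the first left singular vector is used to score sentences, and `svd()` may return it with either sign. For a non-negative matrix such as a document-term matrix, this vector can be chosen with all entries non-negative, so ranking with `decreasing = FALSE` works only when `svd()` happens to return the non-positive version. The sketch below, on a hypothetical 3 × 4 matrix, fixes the sign explicitly before ranking.

```r
# Hypothetical document-term matrix: rows = sentences, columns = terms
A <- matrix(c(2, 1, 0, 1,
              1, 1, 1, 0,
              0, 0, 1, 1), nrow = 3, byrow = TRUE)

s <- svd(A)
u1 <- s$u[, 1]
# Flip the leading left singular vector so that its entries are non-negative
if (sum(u1) < 0) u1 <- -u1
scores <- u1 * s$d[1]   # multiplying by d1 > 0 does not change the ranking
ranking <- order(scores, decreasing = TRUE)
print(ranking)
```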
## [[1]]
## [1] "In September 2012, the United Kingdom's competition authority, the Office of Fair Trading (OFT), issued a statement of objections against Booking.com, Expedia, and IHG Army Hotels alleging that Booking.com and Expedia had entered into separate arrangements with IHG which restricted the online travel agent's ability to discount the price of room only hotel accommodation. Booking.com, Expedia and IHG proposed the OFT to change their restrictions. The OFT accepted the proposal, but it was later rejected by higher authority at a tribunal."
## [2] "In 2023, The BBC's Watchdog discovered that guests had been contacted by fraudsters over the official Booking.com messaging system, spoof emails, and WhatsApp resulting in financial loss and leaked customer data. Guests complained that it was very difficult to contact Booking.com about this issue, citing poor customer service. The fraudsters direct guests to replicas of the Booking.com website containing the reservation data and personal details of the guest and ask them to make a payment, a temporary transfer of funds or card validation."
## [3] "In April 2015, French, Swedish and Italian competition authorities accepted a proposal by Booking.com to drop its \"rate parity\" clause and thereby allow competitor travel agents to offer lower hotel prices than Booking.com. Booking.com further agreed to extend and apply its proposal across all EU states. Hotels are still prevented from discounting prices directly on their own websites."
## [4] "In November 2014, it was revealed that criminals were able to obtain customer details from the website. Booking.com said it was countering the fraudsters and refunding customers from the UK, US, France, Italy, the UAE, and Portugal, all of which had been affected. Since the fraud, Booking.com has made changes so data can only be accessed from a computer linked to the hotel's server. Its teams have also worked to \"takedown\" dozens of phishing sites, as well as working with some banks to freeze the money mule bank accounts."
## [5] "In April 2020, Booking.com drew criticism when it applied for government aid from the Dutch government's relief program for business affected by the COVID-19 pandemic, while paying billions to shareholders, with $6.3 billion in cash on its balance sheet.[46] In response, on May 22, Booking.com announced that it would not seek further wage subsidies from the Dutch government, and instead look for long term answers. The company laid off 25% of its global workforce."
## [6] "The Hungarian Competition Authority (GVH) found it necessary to request an expedited investigation against the Booking.com regarding their undergoing debt case toward the Hungarian accommodation providers. At the same time the Hungarian Tourism Agency (MTÜ) offered legal aid to the ones affected in the matter and sent a questionnaire to Hungarian accommodation providers to assess the extent of the problem."
## [7] "In March 2017, a Turkish court halted activities of Booking.com in Turkey due to a violation of Turkish competition law in a case filed by the Turkish Association of Travel Agents (TÜRSAB). The ruling blocked the website in Turkey; however, website and application can be used from foreign countries to make reservations for hotels in Turkey."
## [8] "In 2019, following dialogue with the European Commission and national consumer (CPC) authorities, Booking.com committed to ensuring that marketing statements regarding; time-limited offers, the amount of rooms available to book, price comparisons, and the type of vendor offering the accommodation was made clearer to consumers. Changes were also made to make sure that sponsored listings were flagged and that the total price was presented to consumers."
## [9] "In November 2022, Salt Labs discovered flaws in the login process of Booking.com. The flaws could have enabled a bad actor to take over guest accounts.Salt Labs note that Booking.com resolved the vulnerability promptly."
## [10] "Glenn Fogel, CEO of Booking.com, apologized in a letter on 7th November to those hosts who were affected by the payment scandal and have not received their money in time. It is more than interesting that Fogel wrote: \"if you are a partner still awaiting payment, and we have not contacted you regarding this, please inform us at\" a given e-mail address in order to solve the issue."
v1 <- svd_result$v[, 1]
top_terms_indices <- order(abs(v1), decreasing = TRUE)[1:10] # magnitude, since the sign of v1 is arbitrary
top_terms <- colnames(matrix_dtm)[top_terms_indices]
list(top_terms)
## [[1]]
## [1] "bookingcom" "hotels" "expedia" "oft"
## [5] "ihg" "data" "authority" "guests"
## [9] "competition" "accommodation"
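The term ranking uses the first right singular vector `v1`. Because `u1` and `v1` change sign together, a single sign correction keeps them consistent, which can be verified with the identity A v1 = d1 u1. The sketch below uses a hypothetical toy matrix with made-up term names.

```r
# Hypothetical document-term matrix with made-up term names
A <- matrix(c(2, 1, 0, 1,
              1, 1, 1, 0,
              0, 0, 1, 1), nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("booking", "hotel", "price", "expedia")))

s <- svd(A)
u1 <- s$u[, 1]
v1 <- s$v[, 1]

# u1 and v1 flip sign together: choose the sign once and apply it to both
sgn <- if (sum(v1) < 0) -1 else 1
u1 <- sgn * u1
v1 <- sgn * v1

# Defining relation of the leading singular triplet: A v1 = d1 u1
stopifnot(isTRUE(all.equal(as.vector(A %*% v1), s$d[1] * u1)))

top_terms <- colnames(A)[order(v1, decreasing = TRUE)]
print(top_terms)
```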
sentences_df <- data.frame(Sentence = text, Score = sentence_scores)
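`sentences_df` pairs each line with its score but is not used further in the listing. One way to turn it into a ranked table is sketched below; the sentences and scores are hypothetical placeholders for `text` and `sentence_scores`, and ranking by absolute value is an assumption motivated by the arbitrary sign of SVD vectors.

```r
# Hypothetical placeholders for text and sentence_scores
sentences_df <- data.frame(
  Sentence = c("Sentence A", "Sentence B", "Sentence C"),
  Score = c(-0.8, -2.1, -0.3),
  stringsAsFactors = FALSE
)

# Rank 1 = largest |score|; ties keep their original order
sentences_df$Rank <- rank(-abs(sentences_df$Score), ties.method = "first")
top_df <- sentences_df[order(sentences_df$Rank), ]
print(head(top_df, 2))
```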
In this project we used R to analyse the text in the file "book_ang.txt". We preprocessed the text, built a document-term matrix from it, and computed the matrix's singular value decomposition (SVD). Using the results, we identified the 10 most important sentences (in practice, non-empty lines of the file, each usually a paragraph) and the 10 key words of the text. Finally, we created an interactive word chart, which helped us visualise and better understand the main content of the text.
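The scoring described above can be written compactly. Assuming $A$ is the $m \times n$ document-term matrix ($m$ non-empty lines, $n$ terms) and $\sigma_1$ is its largest singular value, line $i$ receives score $s_i$ and term $j$ receives score $t_j$; absolute values are taken because the sign of singular vectors is arbitrary:

$$
A = U \Sigma V^\top, \qquad s_i = \sigma_1 \lvert u_{i1} \rvert, \qquad t_j = \lvert v_{j1} \rvert, \qquad A v_1 = \sigma_1 u_1 .
$$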